Fine-grained Arabic named entity recognition
Abstract
Named Entity Recognition (NER) is a Natural Language Processing (NLP) task that aims to extract useful information from unstructured text by detecting Named Entity (NE) phrases and classifying them into predefined semantic classes. This thesis addresses the problem of fine-grained NER for Arabic, which poses unique linguistic challenges such as the absence of capitalisation and short vowels, complex morphology, and a highly inflectional nature. Instead of classifying the detected NE phrases into a small set of classes (i.e. coarse-grained sets ranging from 3 to 10 classes), we target a broader range (i.e. 50 fine-grained classes organised in a two-level hierarchy) to increase the depth of the semantic knowledge extracted. Compared with traditional (coarse-grained) NER, this complicates the task, because the number of semantic classes increases and the semantic differences between fine-grained classes decrease. Fine-grained NER is advantageous in various NLP tasks, including Information Extraction, Ontology Construction and Population, and Question Answering, among many others. Our approach to developing fine-grained NER relies on two different supervised Machine Learning (ML) techniques (i.e. Maximum Entropy ‘ME’ and Conditional Random Fields ‘CRF’), which require annotated (i.e. labelled) training data (i.e. a corpus) in order to learn by extracting informative features. Therefore, the development of such resources comprises one of the thesis contributions. We develop a methodology which exploits the richness of Arabic Wikipedia (AW) in order to create a scalable fine-grained lexical resource (gazetteer) and a corpus automatically. Moreover, two gold-standard cre-
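As a rough illustration of what "extracting informative features" can mean for a supervised tagger such as ME or CRF, the sketch below builds a per-token feature dictionary. It is not the thesis's feature set: the function names, the `COARSE/Fine` label format, and the toy gazetteer entry are all assumptions made for this example. Because Arabic has no capitalisation, the sketch leans on affixes, neighbouring tokens, and gazetteer lookup instead.

```python
from typing import Dict, List, Tuple


def split_label(label: str) -> Tuple[str, str]:
    """Split a hypothetical two-level label such as 'LOC/City' into (coarse, fine)."""
    coarse, _, fine = label.partition("/")
    return coarse, fine


def token_features(tokens: List[str], i: int, gazetteer: Dict[str, str]) -> Dict[str, str]:
    """Build a feature dictionary for tokens[i].

    The gazetteer maps a surface form to a fine-grained class (assumed format).
    No capitalisation feature is used, since Arabic script has none.
    """
    tok = tokens[i]
    return {
        "word": tok,
        "prefix2": tok[:2],    # crude proxy for clitics such as the definite article
        "suffix2": tok[-2:],   # crude proxy for inflectional endings
        "length": str(len(tok)),
        "gaz": gazetteer.get(tok, "NONE"),
        "prev": tokens[i - 1] if i > 0 else "<BOS>",
        "next": tokens[i + 1] if i < len(tokens) - 1 else "<EOS>",
    }


# Toy sentence: "The president visited Cairo" (illustrative only).
tokens = ["زار", "الرئيس", "القاهرة"]
gazetteer = {"القاهرة": "LOC/City"}  # hypothetical gazetteer entry
features = [token_features(tokens, i, gazetteer) for i in range(len(tokens))]
```

Feature dictionaries of this shape are the usual input to CRF and maximum-entropy toolkits. A real system would add morphological analysis and multi-token gazetteer matching, which this sketch leaves out.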
Similar resources
Assessing the Challenge of Fine-Grained Named Entity Recognition and Classification
Named Entity Recognition and Classification (NERC) is a well-studied NLP task typically focused on coarse-grained named entity (NE) classes. NERC for more fine-grained semantic NE classes has not been systematically studied. This paper quantifies the difficulty of fine-grained NERC (FG-NERC) when performed at large scale on the people domain. We apply unsupervised acquisition methods to constru...
Recognition of Person Names Using Injection of Candidate Name Words into Conditional Random Fields for the Arabic Language
Named Entity Recognition and Extraction are very important tasks for discovering proper names, including persons, locations, dates, and times, inside electronic textual resources. An accurate named entity recognition system is an essential utility for resolving fundamental problems in question answering systems, summary extraction, information retrieval and extraction, machine translation, video interpr...
Fine-Grained Named Entity Recognition Using Conditional Random Fields for Question Answering
In many QA systems, fine-grained named entities are extracted by coarse-grained named entity recognizer and fine-grained named entity dictionary. In this paper, we describe a fine-grained Named Entity Recognition using Conditional Random Fields (CRFs) for question answering. We used CRFs to detect boundary of named entities and Maximum Entropy (ME) to classify named entity classes. Using the pr...
Name Translation based on Fine-grained Named Entity Recognition in a Single Language
We propose named entity abstraction methods with fine-grained named entity labels for improving statistical machine translation (SMT). The methods are based on a bilingual named entity recognizer that uses a monolingual named entity recognizer with transliteration. Through experiments, we demonstrate that incorporating fine-grained named entities into statistical machine translation improves th...
A Hybrid Approach to Features Representation for Fine-grained Arabic Named Entity Recognition
Despite considerable research on the topic of Arabic Named Entity Recognition (NER), almost all efforts focus on a traditional set of semantic classes, features and token representations. In this work, we advance previous research in a systematic manner and devise a novel method to represent these features, relying on a dependency-based structure to capture further evidence within the sentence....